@niaow
Member

@niaow niaow commented Jun 21, 2020

This replaces #1039. It should be a lot more efficient. When inspecting the heap while running testdata/gc.go, I found that there were typically only 3 free spans at any given time. I think this should substantially improve our situation with fragmentation.

@niaow
Member Author

niaow commented Jun 22, 2020

I ran testdata/gc.go on cortex-m-qemu, printing the free span tree after every GC cycle:

free spans:
  1x 400 bytes: 0x2000fe70
  1x 1856 bytes: 0x2000f480
  1x 54976 bytes: 0x20001850
free spans:
  1x 1632 bytes: 0x2000dd90
  1x 5120 bytes: 0x2000ec00
  1x 49488 bytes: 0x20001850
free spans:
  1x 976 bytes: 0x2000d350
  1x 1680 bytes: 0x2000bfc0
  1x 1872 bytes: 0x2000c920
  1x 10096 bytes: 0x2000d890
  1x 42640 bytes: 0x20001850
free spans:
  1x 2240 bytes: 0x2000b080
  1x 2560 bytes: 0x2000a4d0
  1x 17040 bytes: 0x2000bd70
  1x 34992 bytes: 0x20001850
free spans:
  1x 976 bytes: 0x20009720
  1x 24992 bytes: 0x20009e60
  1x 31712 bytes: 0x20001850
free spans:
  1x 16 bytes: 0x2000fff0
  1x 1824 bytes: 0x200089d0
  1x 27872 bytes: 0x20009300
  1x 28928 bytes: 0x20001850
free spans:
  1x 1792 bytes: 0x20007bf0
  1x 24688 bytes: 0x20001850
  1x 30704 bytes: 0x20008810
free spans:
  1x 384 bytes: 0x2000fae0
  1x 6656 bytes: 0x2000dd60
  1x 17904 bytes: 0x200093f0
  1x 30736 bytes: 0x20001850
free spans:
  1x 1808 bytes: 0x20007f30
  1x 26320 bytes: 0x20001850
  1x 29344 bytes: 0x20008d60
ok

It would however be nice to test this with a more realistic workload.
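For anyone who wants to reproduce a dump like the one above on another workload, here is a minimal sketch of how such a listing could be produced. It is not the code from this PR: the `span` type, `formatFreeSpans`, and `largestShare` are hypothetical names introduced for illustration, and the sketch assumes the free list is available as a plain slice of address/size pairs. `largestShare` adds a rough fragmentation indicator (the fraction of free memory held by the single largest span), which is not part of the original output.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// span is a hypothetical description of one free region of the heap.
type span struct {
	addr uintptr // start address of the free region
	size uintptr // length of the region in bytes
}

// formatFreeSpans groups spans by size and renders them smallest first,
// mimicking the "Nx SIZE bytes: ADDR..." layout of the dump above.
func formatFreeSpans(spans []span) string {
	bySize := map[uintptr][]uintptr{}
	for _, s := range spans {
		bySize[s.size] = append(bySize[s.size], s.addr)
	}
	sizes := make([]uintptr, 0, len(bySize))
	for size := range bySize {
		sizes = append(sizes, size)
	}
	sort.Slice(sizes, func(i, j int) bool { return sizes[i] < sizes[j] })

	var b strings.Builder
	b.WriteString("free spans:\n")
	for _, size := range sizes {
		addrs := bySize[size]
		fmt.Fprintf(&b, "  %dx %d bytes:", len(addrs), size)
		for _, a := range addrs {
			fmt.Fprintf(&b, " 0x%x", a)
		}
		b.WriteString("\n")
	}
	return b.String()
}

// largestShare returns the fraction of all free bytes that sit in the
// single largest span. Values near 1 suggest little fragmentation.
func largestShare(spans []span) float64 {
	var total, largest uintptr
	for _, s := range spans {
		total += s.size
		if s.size > largest {
			largest = s.size
		}
	}
	if total == 0 {
		return 0
	}
	return float64(largest) / float64(total)
}

func main() {
	// Values copied from the first GC cycle in the dump above.
	first := []span{
		{0x2000fe70, 400},
		{0x2000f480, 1856},
		{0x20001850, 54976},
	}
	fmt.Print(formatFreeSpans(first))
	fmt.Printf("largest span share: %.3f\n", largestShare(first))
}
```

Tracking the largest-span share across cycles would give a single number to compare between the old and new allocator on a more realistic workload.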

@aykevl
Member

aykevl commented Jun 23, 2020

Skimming through the diff in code size, it looks like there is a code size increase of about 300 bytes in most cases (source: https://gist.github.com/aykevl/eb1239dc0290fe79251e5fd7995b437a). This cost should be considered. It may still be worthwhile, but such an increase can be a problem on small chips (especially something like AVR, which will likely see a bigger increase because of its less compact instruction set).

I would like to hear other views on this. Even though ~300 bytes may not sound like much, many of these changes quickly add up, especially for a feature that may not be necessary for many programs. It may be an idea to see whether it's possible to somehow split the GC into two: one baseline GC (what we have right now) and one that uses roughly the same (conservative) algorithm but with optimizations like this PR provides, for more GC-heavy programs.

@niaow
Member Author

niaow commented Jun 23, 2020

> especially something like AVR

I think this is more important on AVR than it is on other instruction sets. This isn't a performance optimization; it is a reliability change.

However, I think we might actually be better off with an entirely separate collector for very tiny heaps, where we use a small compile-time-sized array of words for the metadata. This could make the collector a lot smaller. I can try this out in a separate PR. This would not be the same algorithm; it would be simpler.
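To make the "compile-time-sized array of words for the metadata" idea concrete, here is a rough sketch of what such a collector's bookkeeping might look like. This is purely illustrative and not a proposed implementation: the constants, the `tinyHeap` type, and the restriction to single-block objects are all assumptions made to keep the example short. The point is only that every piece of metadata lives in fixed-size arrays whose length is a constant expression.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	blockSize = 16                   // bytes per block (assumed)
	numBlocks = 64                   // blocks in the heap (assumed)
	wordBits  = 32                   // bits per metadata word
	metaWords = numBlocks / wordBits // metadata words, fixed at compile time
)

// tinyHeap keeps one "used" bit and one "marked" bit per block.
// Both bitmaps are arrays, so no metadata is sized at run time.
type tinyHeap struct {
	used   [metaWords]uint32 // bit set: block is allocated
	marked [metaWords]uint32 // bit set: block was reached while marking
}

func getBit(m *[metaWords]uint32, i int) bool {
	return m[i/wordBits]&(uint32(1)<<uint(i%wordBits)) != 0
}

func setBit(m *[metaWords]uint32, i int) {
	m[i/wordBits] |= uint32(1) << uint(i%wordBits)
}

// alloc claims the first free block and returns its index, or -1 if full.
// Simplification: every object occupies exactly one block.
func (h *tinyHeap) alloc() int {
	for i := 0; i < numBlocks; i++ {
		if !getBit(&h.used, i) {
			setBit(&h.used, i)
			return i
		}
	}
	return -1
}

// mark records that block i is reachable. Indices that are out of range
// or not allocated are ignored, as a conservative scanner would need.
func (h *tinyHeap) mark(i int) {
	if i >= 0 && i < numBlocks && getBit(&h.used, i) {
		setBit(&h.marked, i)
	}
}

// sweep frees every allocated block that was not marked, clears the
// marks for the next cycle, and returns the number of blocks freed.
func (h *tinyHeap) sweep() int {
	freed := 0
	for w := range h.used {
		garbage := h.used[w] &^ h.marked[w]
		freed += bits.OnesCount32(garbage)
		h.used[w] &= h.marked[w]
		h.marked[w] = 0
	}
	return freed
}

func main() {
	var h tinyHeap
	a, b, c := h.alloc(), h.alloc(), h.alloc()
	h.mark(a) // pretend only a and c are still referenced
	h.mark(c)
	fmt.Println("heap bytes:", blockSize*numBlocks)
	fmt.Println("freed blocks:", h.sweep())
	fmt.Println("block", b, "reused:", h.alloc() == b)
}
```

With these assumed constants the metadata is two 8-byte bitmaps for a 1 KiB heap. A real collector would also need multi-block objects and root scanning, which this sketch leaves out.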

@niaow niaow added this to the v0.15 milestone Jun 27, 2020
@deadprogram
Member

Same thing on this PR, curious about the status @niaow?

@niaow
Member Author

niaow commented Aug 25, 2020

I am abandoning this.

@niaow niaow closed this Aug 25, 2020

3 participants